Abstract

An algorithm is described for tagging the flavour content at production of neutral
B mesons in the LHCb experiment. The algorithm exploits the correlation of the
flavour of a B meson with the charge of a reconstructed secondary charm hadron
from the decay of the other b hadron produced in the proton-proton collision. Charm
hadron candidates are identified in a number of fully or partially reconstructed
Cabibbo-favoured decay modes. The algorithm is calibrated on the self-tagged decay
modes $B^+ \to J/\psi K^+$ and $B^0 \to J/\psi K^{*0}$ using 3.0 fb$^{-1}$ of data collected by the
LHCb experiment at pp centre-of-mass energies of 7 TeV and 8 TeV. Its tagging
power on these samples of $B \to J/\psi X$ decays is (0.30 ± 0.01 ± 0.01)%.


Submitted to J. Instrum. 

© CERN on behalf of the LHCb collaboration, licence CC-BY-4.0 


1 Authors are listed at the end of this paper. 







1 Introduction 


Measurements that involve mixing and time-dependent CP asymmetries in decays of 
neutral B mesons require the identification of their flavour content at production. This is 
achieved via various flavour tagging algorithms that exploit information from the rest of 
the pp collision event. Same-side (SS) taggers look for particles produced in association 
with the signal B meson during the hadronization of the b quark [1]. The d or s partner
of the light valence quark of the signal B has a roughly 50% chance of hadronizing into a
charged pion or kaon. Since b quarks are mostly produced in $b\bar{b}$ pairs, the flavour content
of the signal B meson can also be deduced from available information on the opposite-side
(OS) b hadron, whose flavour is the opposite of the signal B meson at the production time.
OS muon and electron taggers look for leptons originating from semileptonic $b \to cW$
transitions of the b hadron, and an OS kaon tagger looks for kaons coming from $b \to c \to s$
transitions. A vertex-charge tagger reconstructs the decay vertex of the OS b hadron
and predicts its charge by weighting the charges of its decay products according to their
transverse momentum. The OS taggers employed by LHCb are described in Ref. [2] and
the SS taggers in Refs. [3, 4]. This paper reports a new flavour tagging algorithm for
the LHCb experiment that relies on reconstructed decays of charm hadrons produced in
the OS b hadron decay. For the development and evaluation of the tagging algorithm,
signal B meson and charm hadron candidates are reconstructed using data from 3 fb$^{-1}$ of
integrated luminosity collected by LHCb at 7 TeV and 8 TeV centre-of-mass energies in
2011 and 2012, respectively.

The performance of a flavour tagging algorithm is defined by its tagging efficiency, $\varepsilon_{\rm tag}$,
mistag fraction, $\omega$, and dilution, $\mathcal{D} = 1 - 2\omega$. For a simple tagging algorithm with discrete
decisions ($B^0$, $\bar{B}^0$, or untagged), these metrics are directly related to the numbers of
rightly tagged ($R$), wrongly tagged ($W$), and untagged ($U$) events in a signal sample:

$$\varepsilon_{\rm tag} = \frac{R + W}{R + W + U}, \qquad \omega = \frac{W}{R + W}, \qquad \mathcal{D} = \frac{R - W}{R + W}. \qquad (1)$$
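As a minimal illustration of the definitions above, the following sketch computes the three metrics from event counts. The function name and the example counts are hypothetical and are not taken from the analysis.

```python
def tagging_metrics(n_right, n_wrong, n_untagged):
    """Return (tagging efficiency, mistag fraction, dilution) for discrete decisions."""
    n_tagged = n_right + n_wrong
    if n_tagged == 0:
        raise ValueError("no tagged events")
    efficiency = n_tagged / (n_tagged + n_untagged)
    mistag = n_wrong / n_tagged
    dilution = (n_right - n_wrong) / n_tagged  # identical to 1 - 2 * mistag
    return efficiency, mistag, dilution


# Hypothetical counts, chosen only for illustration.
eff, omega, dilution = tagging_metrics(n_right=650, n_wrong=350, n_untagged=9000)
```

The dilution returned here equals $1 - 2\omega$ by construction, which is a convenient sanity check when the counts come from separate sources.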

The performance of the flavour tagging algorithms is improved by assigning confidence
weights to their tagging decisions. For each tagger, a multivariate classifier is trained
using simulated data to distinguish between correct and incorrect decisions [2]. The
inputs to the classifier are a selection of kinematic and geometric quantities describing
the tagging track(s), the signal B meson, and the event. This classifier then calculates a
predicted mistag probability $\eta$ for each decision made. The predicted mistag probability
is calibrated to data using an appropriate flavour self-tagged mode, such as $B^+ \to J/\psi K^+$,
or a mode involving neutral B oscillation, which self-tags its flavour at the decay-time,
such as $B^0 \to J/\psi K^{*0}$ or $B_s^0 \to D_s^-\pi^+$ [4, 5] (the use of charge-conjugate modes is implied
throughout this paper). This calibration procedure provides a function $\omega(\eta)$, which relates
the actual mistag probability $\omega$ to the predicted mistag probability $\eta$. Weighting each signal
candidate by $1 - 2\omega(\eta)$ leads to an improved effective mistag fraction $\omega$ and associated
dilution $\mathcal{D} = 1 - 2\omega$. The statistical power of a CP asymmetry measurement using a
tagging algorithm is proportional to the effective tagging efficiency (or tagging power) $\varepsilon_{\rm eff}$,
defined as

$$\varepsilon_{\rm eff} = \varepsilon_{\rm tag} \mathcal{D}^2. \qquad (2)$$

The typical combined tagging power of the current set of OS tagging algorithms used by
LHCb is approximately 2.5% [3, 6-8]. Any augmentation to this tagging power increases
the statistical precision achievable in CP measurements at LHCb.
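When each tagged candidate carries its own calibrated mistag probability, the tagging power is commonly estimated by averaging the squared per-candidate dilution over all signal candidates, with untagged candidates contributing zero. The sketch below assumes this estimator; the function names and input values are illustrative only.

```python
def tagging_power(calibrated_mistags, n_total):
    """Effective tagging efficiency from per-candidate calibrated mistag values.

    calibrated_mistags: omega(eta) for each tagged candidate
    n_total: number of signal candidates, tagged and untagged
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    sum_d2 = sum((1.0 - 2.0 * w) ** 2 for w in calibrated_mistags)
    return sum_d2 / n_total


def effective_mistag(calibrated_mistags):
    """Single mistag fraction that reproduces the average squared dilution."""
    n_tagged = len(calibrated_mistags)
    mean_d2 = sum((1.0 - 2.0 * w) ** 2 for w in calibrated_mistags) / n_tagged
    return 0.5 * (1.0 - mean_d2 ** 0.5)


# Hypothetical sample: 3 tagged candidates out of 100 signal candidates.
mistags = [0.25, 0.35, 0.45]
power = tagging_power(mistags, n_total=100)
omega_eff = effective_mistag(mistags)
```

With these definitions, the product of the tagging efficiency and $(1 - 2\omega_{\rm eff})^2$ reproduces the tagging power, so candidates with a low predicted mistag dominate the result.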


2 Detector and simulation 


The LHCb detector [9, 10] is a single-arm forward spectrometer covering the pseudorapidity
range between 2 and 5, designed for the study of particles containing b or c quarks. The
detector includes a high-precision tracking system consisting of a silicon-strip vertex
detector surrounding the pp interaction region [11], a large-area silicon-strip detector
located upstream of a dipole magnet with a bending power of about 4 Tm, and three
stations of silicon-strip detectors and straw drift tubes [12] placed downstream of the
magnet. The tracking system provides a measurement of momentum, $p$, of charged particles
with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c.
The minimum distance of a track to a primary vertex (PV), the impact parameter, is
measured with a resolution of $(15 + 29/p_T)\,\mu$m, where $p_T$ is the component of the momentum
transverse to the beam, in GeV/c. Different types of charged hadrons in the momentum
range 2-100 GeV/c are distinguished using information from two ring-imaging Cherenkov
detectors [13]. Photons, electrons and hadrons are identified by a calorimeter system
consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter
and a hadronic calorimeter. Muons are identified by a system composed of alternating
layers of iron and multiwire proportional chambers [14]. The online event selection is
performed by a trigger [15], which consists of a hardware stage, based on information from
the calorimeter and muon systems, followed by a software stage, which applies a full event
reconstruction.

In the simulation, pp collisions are generated using Pythia [16, 17] with a specific
LHCb configuration [18]. Decays of hadronic particles are described by EvtGen [19],
in which final-state radiation is generated using Photos [20]. The interaction of the
generated particles with the detector, and its response, are implemented using the Geant4
toolkit [21] as described in Ref. [22].


3 Tagging potential of OS charm hadrons 

In events containing a signal B decay, the opposite-side $D^+$, $D^0$, and $\Lambda_c^+$ charm hadrons
are primarily produced through the quark-level $b \to c$ transition. The charge of the
$D^+$ or $\Lambda_c^+$ determines the flavour of the b hadron parent. For $D^0$ decays through the
dominant Cabibbo-favoured process $D^0 \to K^- X$, the kaon charge determines the flavour
of the charm hadron, and thereby that of the parent B hadron (the effect of $D^0$ mixing is
negligible). The OS charm tagging algorithm uses charm hadron candidates reconstructed
in a number of decay modes, chosen for their relatively large branching fractions, listed in
Table 1. These include fully reconstructed hadronic modes with a single charged kaon in
the final state, partially reconstructed hadronic modes with an unobserved neutral pion,
and partially reconstructed semileptonic modes. Table 1 also reports the breakdown of the
charm tagger's performance by decay mode. The relative rate and relative power of each
mode are the amounts that it contributes to the algorithm's total tagging rate $\varepsilon_{\rm tag}$ and
tagging power $\varepsilon_{\rm eff}$, which are presented in Section 6 and Table 3. The algorithm predicts
the flavour of the signal B meson using the charge of the kaon in the same manner as the
OS kaon tagger; however, the selection based on the reconstruction of c hadrons (rather
than the selection of kaons based on their individual kinematic properties) results in a
different set of selected kaons and provides a complementary source of tagging information.

Several effects contribute an irreducible component to the mistag probability for the OS
charm tagging algorithm. The dominant impact comes from $B^0$-$\bar{B}^0$ oscillation and from
the contributions of "wrong sign" charm hadrons produced in $b \to c\bar{c}q$ transitions. The
impact of Cabibbo-suppressed $D^0 \to K^+ X$ decays is negligible, as these typically produce
additional kaons and do not mimic modes used by the tagging algorithm, and doubly
Cabibbo-suppressed decays such as $D^0 \to K^+\pi^-$ have a negligibly small branching fraction.
Accounting for relative production rates of b hadrons, neutral B oscillation, and branching
fractions of the decay modes used in the tagger, the irreducible mistag probabilities for
$D^0$, $D^+$ and $\Lambda_c^+$ modes are estimated to be 23%, 19%, and 6%, respectively.
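The size of the oscillation contribution can be gauged from the time-integrated mixing probability $\chi = x^2 / [2(1 + x^2)]$, with $x = \Delta m / \Gamma$, which holds when the decay-width difference is negligible. The sketch below evaluates this expression. The value $x_d \approx 0.77$ is an approximate external input and the function name is hypothetical; the estimates quoted above additionally fold in b-hadron production rates and wrong-sign charm, which this sketch does not attempt to reproduce.

```python
def mixing_probability(x):
    """Time-integrated probability that a neutral B meson decays with flipped flavour.

    x = delta_m / gamma; assumes a negligible decay-width difference.
    """
    return x * x / (2.0 * (1.0 + x * x))


# Approximate x_d for the B0 system, used here as an assumed input.
chi_d = mixing_probability(0.77)
print(f"chi_d ~ {chi_d:.3f}")
```

In the limit of very fast oscillation the expression tends to 0.5, which is why an opposite-side $B_s^0$ carries essentially no usable flavour information.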

In addition to the irreducible mistag probability arising from physics effects, the charm 
hadron candidates are contaminated by combinatorial and partially reconstructed b and 
c hadron background that can lead to an incorrect flavour tag result. For each mode, 
the charm tagger uses a multivariate algorithm that combines geometric and kinematic 
quantities and properties of the c hadron candidate and its daughters. The resulting 
discriminating variable is used both to suppress the combinatorial background and to 
predict the corresponding mistag probability for the surviving candidate. 


4 Selection of charm candidates 

Charm decay candidates are formed by combining kaon, pion, and proton candidates that
satisfy particle identification criteria. These particles are required to have momentum
$p > 1000$ MeV/c, transverse momentum with respect to the beam axis $p_T > 100$ MeV/c, and
to be significantly displaced from any PV. For the candidates in the partially reconstructed
modes and the decay $D^0 \to K^-\pi^+\pi^+\pi^-$, which contain large combinatorial backgrounds,
more stringent requirements are imposed on the displacement of the final-state particles
from the PV. In addition, particles are required to have $p_T > 150$ MeV/c for candidates in
the mode $D^0 \to K$.

Table 1: Decay modes used in the OS charm tagger. The symbol $H_c$ stands for any c hadron.
The definition of the two right-most columns is given in the text.

Decay mode                           Relative rate   Relative power
$D^0 \to K^-\pi^+$                   10.0%           24.0%
$D^0 \to K^-\pi^+\pi^+\pi^-$         5.9%            8.4%
$D^+ \to K^-\pi^+\pi^+$              10.3%           2.6%
$H_c \to K^-\pi^+ X$                 69.7%           61.5%
$H_c \to K^- e^+ X$                  0.5%            0.2%
$H_c \to K^-\mu^+ X$                 3.4%            0.3%
$\Lambda_c^+ \to p K^-\pi^+$         0.2%            2.4%

Charm hadron candidates are required to pass a number of selection requirements.
These include a maximum distance of closest approach between each pair of daughter
tracks and a minimum quality of the decay vertex fit. Each candidate is required to be well
separated from any PV and to have a trajectory that leads back to the best PV, chosen to
be the PV for which the impact parameter significance of the charm hadron is smallest.
The invariant mass of the charm hadron candidate is required to be consistent with the
known mass of the corresponding charm hadron, within 100 MeV/c$^2$ for the $\Lambda_c^+$ channel and
50 MeV/c$^2$ for all other fully reconstructed D decay modes. For the partially reconstructed
$D \to K^-\pi^+ X$ modes, the $K^-\pi^+$ mass is required to be in a [-400 MeV/c$^2$, +0 MeV/c$^2$]
window around the known $D^0$ mass or in a window of ±50 MeV/c$^2$ around the $K^*(892)^0$
resonance. The former is favoured by the invariant mass distribution of $K^-\pi^+$ pairs
from the quasi-two body decay $D^0 \to K^-\rho^+$, and the latter selects $D \to K^*(892)^0 X$
decays. Charm candidates surviving these criteria still contain significant background
contamination, which must be further reduced in order to lower the mistag probability of
the algorithm.
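The mass requirements described above can be sketched as two simple predicates. The known masses below are approximate values entered by hand for illustration, and all names are hypothetical; this is not the selection code used in the analysis.

```python
# Approximate known masses in MeV/c^2 (illustrative inputs, not from the paper).
KNOWN_MASS = {"D0": 1864.8, "D+": 1869.7, "Lambda_c+": 2286.5, "K*(892)0": 895.6}


def passes_full_reco_window(mass, hadron):
    """Fully reconstructed modes: within 100 MeV/c^2 for Lambda_c+, 50 MeV/c^2 otherwise."""
    half_width = 100.0 if hadron == "Lambda_c+" else 50.0
    return abs(mass - KNOWN_MASS[hadron]) <= half_width


def passes_partial_reco_window(m_kpi):
    """Partially reconstructed K- pi+ X modes, tested on the K- pi+ invariant mass."""
    delta_d0 = m_kpi - KNOWN_MASS["D0"]
    below_d0 = -400.0 <= delta_d0 <= 0.0            # missing neutral lowers the mass
    near_kstar = abs(m_kpi - KNOWN_MASS["K*(892)0"]) <= 50.0
    return below_d0 or near_kstar
```

The asymmetric window below the $D^0$ mass reflects the momentum carried away by the unreconstructed particle, which can only reduce the visible $K^-\pi^+$ mass.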

For each mode, an adaptive-boosted decision tree (BDT) [23, 24] is used both to
suppress background candidates and to estimate mistag probabilities. The inputs to
the BDT are variables describing the decay kinematics, decay vertex and displacement,
and particle identification information on the decay products. A variable related to the
decay-time is calculated from the distance between the c hadron's decay vertex and the
corresponding best PV; this approximates the sum of the decay-times of the c hadron and
its parent b hadron. The BDT algorithms are trained using Monte Carlo (MC) simulations
of $b\bar{b}$ events containing $B^+ \to J/\psi K^+$, $B^0 \to J/\psi K^{*0}$, and $B_s^0 \to J/\psi\phi$ decays on the signal
side and inclusive decays of the b hadron on the opposite-side. These B decays are used to
model the various sources and amounts of background when reconstructing OS c hadrons
recoiling against signal B decays.

The output of the BDT, along with the simulation record of candidate identification, is
used to compute the predicted mistag probability $\eta$ for each c hadron candidate. Candidates
with $\eta < 45\%$ are used in the flavour tagging decision. Removing candidates that fail
this criterion significantly reduces the computing time of the algorithm at little cost to
tagging performance. In cases where multiple charm candidates are present, the candidate
with the lowest predicted mistag probability is retained. The combined efficiency of these
requirements for retaining tagged events is (59.00 ± 0.07)% and (53.4 ± 0.3)% in simulation
and data, respectively.
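The candidate choice and the resulting tag decision can be sketched as follows. The class, field, and function names are hypothetical; only the 45% requirement, the lowest-mistag rule, and the use of the kaon charge come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ETA_CUT = 0.45  # only candidates with predicted mistag below 45% are used


@dataclass
class CharmCandidate:
    mode: str          # decay mode label (hypothetical naming)
    kaon_charge: int   # charge of the kaon from the charm decay, +1 or -1
    eta: float         # predicted mistag probability from the per-mode BDT


def select_tag_candidate(cands: Sequence[CharmCandidate]) -> Optional[CharmCandidate]:
    """Drop candidates at or above the cut, then keep the lowest-eta survivor."""
    kept = [c for c in cands if c.eta < ETA_CUT]
    return min(kept, key=lambda c: c.eta) if kept else None


def signal_b_quark(cand: CharmCandidate) -> str:
    """Flavour of the b quark in the signal meson at production.

    In the chain b -> c -> s a K- points to an opposite-side b quark,
    so the signal meson then contains a b-bar.
    """
    return "bbar" if cand.kaon_charge < 0 else "b"
```

An event in which `select_tag_candidate` returns `None` is simply left untagged by the charm tagger.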
Table 2: Calibration parameters as determined from the $B^+ \to J/\psi K^+$ and $B^0 \to J/\psi K^{*0}$ control
samples. For both calibration modes, the average predicted mistag probability $\langle\eta\rangle$ is 0.379. The
first uncertainties are statistical and the second are systematic. The systematic uncertainties are
evaluated using simulation.

Sample                       $\delta p_0$ ($10^{-3}$)   $p_1$                  $\Delta p_0$ ($10^{-3}$)   $\Delta p_1$
$B^+ \to J/\psi K^+$         -25 ± 3 ± 3                1.00 ± 0.06 ± 0.02     15 ± 5 ± 4                 -0.08 ± 0.12 ± 0.04
$B^0 \to J/\psi K^{*0}$      -18 ± 8 ± 3                1.16 ± 0.17 ± 0.02     23 ± 11 ± 4                0.21 ± 0.25 ± 0.04


5 Calibration 


While simulated data are used to develop and optimize the charm tagging algorithm, its
performance is calibrated with collision data by comparing the algorithm's predictions to
the known flavours of signal B candidates, according to the procedure detailed in Ref. [2].
The calibration parameters $p_0$, $p_1$, $\Delta p_0$, and $\Delta p_1$ are defined by

$$\omega = p_0 + p_1 (\eta - \langle\eta\rangle),$$

$$\Delta\omega = \Delta p_0 + \Delta p_1 (\eta - \langle\eta\rangle),$$

where $\langle\eta\rangle$ is the average predicted mistag probability, $\omega$ is the actual mistag probability
averaged over $B^+$ and $B^-$ signal mesons, and $\Delta\omega$ is the excess mistag probability for
$B^+$ mesons with respect to $B^-$ mesons; equivalent definitions hold for $B^0/\bar{B}^0$ signal. In
the ideal case, the offset parameter $p_0$ should equal $\langle\eta\rangle$, and so the related parameter
$\delta p_0 = p_0 - \langle\eta\rangle$ is often more convenient.
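A minimal sketch of how these definitions are applied to a single candidate is given below. Since $\omega$ is the average over the two flavours and $\Delta\omega$ their difference, the flavour-specific values follow as $\omega \pm \Delta\omega/2$. The function names are hypothetical, and the numerical inputs are the central values quoted in Table 2 for the $B^+ \to J/\psi K^+$ sample, used only as an example.

```python
def calibrated_mistag(eta, p0, p1, mean_eta):
    """Linear calibration omega(eta) = p0 + p1 * (eta - <eta>)."""
    return p0 + p1 * (eta - mean_eta)


def flavour_split_mistag(eta, p0, p1, dp0, dp1, mean_eta):
    """Return (omega for B+, omega for B-).

    omega is the average of the two and delta_omega their difference.
    """
    omega = calibrated_mistag(eta, p0, p1, mean_eta)
    delta = dp0 + dp1 * (eta - mean_eta)
    return omega + 0.5 * delta, omega - 0.5 * delta


MEAN_ETA = 0.379
P0 = MEAN_ETA - 0.025  # from delta_p0 = -25e-3
omega_plus, omega_minus = flavour_split_mistag(
    eta=0.30, p0=P0, p1=1.00, dp0=0.015, dp1=-0.08, mean_eta=MEAN_ETA
)
```

Writing the calibration around $\langle\eta\rangle$ reduces the correlation between the fitted offset and slope, which is one reason $\delta p_0$ is the more convenient parameter to quote.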

A calibration of the algorithm has been performed using the flavour self-tagged mode
$B^+ \to J/\psi K^+$. The signal candidates are selected by combining pairs of oppositely charged
muons, with invariant mass consistent with the known $J/\psi$ mass, with charged kaons,
and are required to pass a set of cuts to obtain a good signal to background ratio [2].
When multiple candidates are present for a single event, that with the best decay vertex
fit is kept. A fit to the reconstructed $B^+$ mass distribution is used to separate signal
and background via the sPlot procedure, which computes signal and background weights
for each candidate [25]. The empirical model for the signal is a sum of two Crystal Ball
functions [26], while background is modeled by an exponential distribution. A total of
$1.1 \times 10^6$ signal candidates in this channel are found in the full dataset. The parameters
$p_0$ and $p_1$ are determined by splitting the data into 13 bins of $\eta$ between 0.19 and 0.45,
calculating $\omega_i$ and $\eta_i$ (the average $\eta$) in each bin, and performing a linear fit to the set of
values $(\omega_i, \eta_i)$. The calibration parameters $\Delta p_0$ and $\Delta p_1$ are obtained from fits to the $B^+$
and $B^-$ data each split into 5 bins of $\eta$. The quantities $\Delta\omega_i$ and $\eta_i$ are calculated in each
of the 5 bins, and a linear fit is performed to the set of values $(\Delta\omega_i, \eta_i)$. These fits are
shown in Figs. 1 and 2. The resulting calibration parameters are given in Table 2.
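The binned linear fit can be sketched as a weighted least-squares problem. The version below is a simplified stand-in: it uses plain counts with binomial uncertainties rather than sPlot weights, and the function names are hypothetical.

```python
def bin_mistag(n_right, n_wrong):
    """Measured mistag fraction in one bin with its binomial uncertainty."""
    n = n_right + n_wrong
    omega = n_wrong / n
    return omega, (omega * (1.0 - omega) / n) ** 0.5


def weighted_linear_fit(x, y, sigma, x0):
    """Weighted least-squares fit of y = p0 + p1 * (x - x0); returns (p0, p1).

    sigma holds the uncertainty on each y value.
    """
    w = [1.0 / s ** 2 for s in sigma]
    dx = [xi - x0 for xi in x]
    sw = sum(w)
    sx = sum(wi * d for wi, d in zip(w, dx))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * d * d for wi, d in zip(w, dx))
    sxy = sum(wi * d * yi for wi, d, yi in zip(w, dx, y))
    det = sw * sxx - sx * sx
    p1 = (sw * sxy - sx * sy) / det
    p0 = (sxx * sy - sx * sxy) / det
    return p0, p1
```

Calling `weighted_linear_fit` with the per-bin average $\eta_i$ as `x`, the measured $\omega_i$ as `y`, and `x0` set to $\langle\eta\rangle$ returns the offset and slope in the parametrisation used in this section.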

Figure 1: Mistag probability $\omega$ as a function of the predicted mistag probability $\eta$ for the
$B^+ \to J/\psi K^+$ data sample. A straight line fit to extract the parameters $p_0$ and $p_1$ is superimposed.
The dark (green) and light (yellow) bands are the regions within $1\sigma$ and $2\sigma$ of the fitted value,
respectively.

Figure 2: Excess mistag probability $\Delta\omega$ as a function of the predicted mistag probability $\eta$ for
the $B^+ \to J/\psi K^+$ data sample. A straight line fit to extract the parameters $\Delta p_0$ and $\Delta p_1$ is
superimposed. The dark (green) and light (yellow) bands are the regions within $1\sigma$ and $2\sigma$ of
the fitted value, respectively.

A cross-check of the calibration has been carried out using a $B^0 \to J/\psi K^{*0}$ control
sample. For this calibration, $B^0$-$\bar{B}^0$ oscillation must be taken into account. The Hypatia
function [27] is used to model the signal's mass distribution, while the background is
modeled with a sum of two exponential functions. A set of simultaneous fits to the $B^0$
decay-time distribution in bins of $\eta$ is performed, in which $p_0$, $p_1$, $\Delta p_0$, and $\Delta p_1$ are parameters
of the fit model. In each bin, the raw $B^0$-$\bar{B}^0$ mixing asymmetry is defined as

$$A_{\rm mixing} = \frac{N(\mathcal{D} = \mathcal{P}) - N(\mathcal{D} \neq \mathcal{P})}{N(\mathcal{D} = \mathcal{P}) + N(\mathcal{D} \neq \mathcal{P})}, \qquad (3)$$

where, in this expression, $\mathcal{D}$ is the B meson flavour at decay-time (not the dilution) and $\mathcal{P}$ is
the production flavour predicted by the charm tagger. The amplitude of this asymmetry is governed
by the actual mistag fraction $\omega_i$ in the bin, while the bin's average predicted mistag probability is
$\eta_i$. The fit attempts to match the calibrated value $\omega(\eta_i)$ to $\omega_i$ in each bin by adjusting the calibration
parameters. A projection of the fitted model to the mixing asymmetry is shown in Fig. 3.
The values of the calibration parameters obtained from the fit are given in Table 2. The
parameters are compatible with those obtained in the $B^+ \to J/\psi K^+$ mode, with the total
$\chi^2$ per degree of freedom equal to 0.65.

Figure 3: Raw $B^0$-$\bar{B}^0$ mixing asymmetry (defined in Eq. 3) vs. decay-time for the $B^0 \to J/\psi K^{*0}$
data sample. The amplitude of the asymmetry is diluted due to mistagging by the charm tagger.
The mixing asymmetry from the fit is superimposed.
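The relation between the asymmetry amplitude and the mistag fraction can be illustrated with an idealised model in which the raw asymmetry follows $(1 - 2\omega)\cos(\Delta m\, t)$. The sketch below assumes this form and ignores decay-time resolution, acceptance, and background; the default oscillation frequency is an approximate $B^0$ value supplied as an assumption, and the function names are hypothetical.

```python
import math


def raw_mixing_asymmetry(n_same, n_opposite):
    """Asymmetry between candidates whose decay flavour matches the predicted
    production flavour (n_same) and those where it differs (n_opposite)."""
    return (n_same - n_opposite) / (n_same + n_opposite)


def expected_asymmetry(t_ps, omega, delta_m=0.51):
    """Idealised asymmetry (1 - 2*omega) * cos(delta_m * t).

    delta_m is in ps^-1; the default is an approximate, assumed B0 value.
    """
    return (1.0 - 2.0 * omega) * math.cos(delta_m * t_ps)


def mistag_from_amplitude(amplitude):
    """Invert amplitude = 1 - 2*omega."""
    return 0.5 * (1.0 - amplitude)
```

In this picture a mistag fraction of 35% reduces the asymmetry amplitude from 1 to 0.3, which is the dilution visible in the raw asymmetry.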

The relatively small yield of the decay $B_s^0 \to D_s^-\pi^+$ precludes performing a data-driven
calibration on a $B_s^0$ mode. Therefore, in order to ensure that the algorithm performs
similarly for $B_s^0$ channels as well as $B^+$ and $B^0$ channels, separate calibrations to simulated
$B^+ \to J/\psi K^+$, $B^0 \to J/\psi K^{*0}$, and $B_s^0 \to J/\psi\phi$ events are performed. Where statistically
significant differences between the calibration parameters in the three channels are found,
a systematic uncertainty, corresponding to half of the maximum difference, is assigned
to the parameter. These systematic uncertainties are roughly the size of the statistical
uncertainties for the parameters $p_0$ and $\Delta p_0$, but are negligible for $p_1$ and $\Delta p_1$. The
propagation of these uncertainties results in a 0.011% absolute systematic uncertainty on
the tagging power, comparable to its statistical uncertainty.
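One possible reading of this prescription is sketched below: pairwise differences between channels are tested for significance, and half of the largest significant difference is assigned. The significance threshold `n_sigma` and the function name are assumptions, since the text does not specify the criterion.

```python
def channel_systematic(values, uncertainties, n_sigma=2.0):
    """Half of the largest significant pairwise difference, or 0 if none is significant.

    values, uncertainties: one calibration parameter and its statistical
    uncertainty per channel. n_sigma is an assumed significance threshold.
    """
    largest = 0.0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            diff = abs(values[i] - values[j])
            err = (uncertainties[i] ** 2 + uncertainties[j] ** 2) ** 0.5
            if diff > n_sigma * err:
                largest = max(largest, diff)
    return 0.5 * largest
```

Parameters whose channel-to-channel differences are all within the statistical scatter receive no systematic uncertainty under this scheme.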

Other sources of systematic uncertainty on calibration parameters have been investigated
and found to have negligible effect. These include the potential effect of the chosen
model of the invariant B mass distribution for the channel $B^+ \to J/\psi K^+$. Two alternative
models of the mass distribution were used and gave nearly identical results.

Figure 4: Distribution of the calibrated predicted mistag probability $\omega(\eta)$ for the $B^+ \to J/\psi K^+$
data sample.

There are additional systematic uncertainties related to flavour tagging that must be 
considered in a CP asymmetry analysis. These include differences between the signal channel
sample and calibration channel sample in phase space distribution, event multiplicity,
number of primary vertices, or other variables. These differences would require corrections 
and would introduce tagging-related systematic uncertainties. Such effects are dependent 
on the signal channel and selection, and must be determined separately for each analysis. 

6 Performance 

The distribution of $\eta$ after calibration for the $B^+ \to J/\psi K^+$ control sample is shown
in Fig. 4. The tagging efficiency, mistag fraction, and the tagging power of the charm
tagger are reported in Table 3 for the training sample of simulated $B \to J/\psi X$ decays and
for both calibration channels. The propagated statistical uncertainty of the calibration
parameters dominates the statistical uncertainty of the tagging power. The overall tagging
power is slightly higher in simulation than in data, due to differences in the distributions
of input variables. The tagging powers in the two $B \to J/\psi X$ calibration channels are
consistent.

Table 3 also reports the tagging metrics for the decays $B^0 \to D^-\pi^+$ and $B_s^0 \to D_s^-\pi^+$.
Fits to the mass distributions of the signal candidates are performed to separate signal
from background. In each fit, the signal is modeled by a sum of two Crystal Ball functions
and the combinatorial background is described by an exponential function. Several fully
and partially reconstructed backgrounds are also modeled in the fit to the $B_s^0 \to D_s^-\pi^+$
sample. The tagging efficiency for these samples is found to be higher than for the samples
of $B \to J/\psi X$ decays, due to correlations between the kinematics of the signal B and the
opposite-side charm hadrons. The effective mistag fraction for these samples is consistent
with that on the $B \to J/\psi X$ samples. The net effect is an increased tagging power for these
$B \to DX$ decays, similar to that observed for other opposite-side tagging algorithms [7, 28].

Table 3: Tagging efficiencies ($\varepsilon_{\rm tag}$), effective mistag fractions ($\omega$), and tagging powers ($\varepsilon_{\rm eff}$) in the
various data samples studied. The first uncertainties are statistical and the second are systematic.
The sample labeled Simulation is the training sample of simulated $B^+ \to J/\psi K^+$, $B^0 \to J/\psi K^{*0}$,
and $B_s^0 \to J/\psi\phi$ decays, which has negligible statistical uncertainties.

Sample                       $\varepsilon_{\rm tag}$   $\omega$                  $\varepsilon_{\rm eff}$
Simulation                   4.88%                     37.0%                     0.33%
$B^+ \to J/\psi K^+$         (3.11 ± 0.02)%            (34.6 ± 0.3 ± 0.3)%       (0.30 ± 0.01 ± 0.01)%
$B^0 \to J/\psi K^{*0}$      (3.32 ± 0.04)%            (35.0 ± 0.8 ± 0.3)%       (0.30 ± 0.03 ± 0.01)%
$B^0 \to D^-\pi^+$           (4.11 ± 0.03)%            (34.4 ± 0.4 ± 0.3)%       (0.40 ± 0.02 ± 0.01)%
$B_s^0 \to D_s^-\pi^+$       (3.99 ± 0.07)%            (34.4 ± 0.6 ± 0.3)%       (0.39 ± 0.03 ± 0.01)%
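As a consistency check, the central values in Table 3 can be inserted into the relation between tagging power, tagging efficiency, and effective mistag fraction. The sketch below does this with the quoted central values only; uncertainties are ignored, and the dictionary layout and function name are hypothetical.

```python
TABLE_3 = {
    # sample: (tagging efficiency, effective mistag fraction, quoted tagging power)
    "Simulation":      (0.0488, 0.370, 0.0033),
    "B+ -> J/psi K+":  (0.0311, 0.346, 0.0030),
    "B0 -> J/psi K*0": (0.0332, 0.350, 0.0030),
    "B0 -> D- pi+":    (0.0411, 0.344, 0.0040),
    "Bs0 -> Ds- pi+":  (0.0399, 0.344, 0.0039),
}


def tagging_power_from_table(eff, omega):
    """Tagging power as efficiency times squared dilution."""
    return eff * (1.0 - 2.0 * omega) ** 2


for sample, (eff, omega, quoted) in TABLE_3.items():
    power = tagging_power_from_table(eff, omega)
    print(f"{sample:16s} computed {100 * power:.2f}%  quoted {100 * quoted:.2f}%")
```

The comparison makes explicit that the larger tagging power in the $B \to DX$ samples is driven by the higher tagging efficiency, since the effective mistag fractions are nearly identical.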


To use the charm tagger in a physics analysis, the flavour tagging information from the 
charm tagger can be combined with information from other tagging algorithms. Assessing 
the actual gain in performance depends on the method of combination and calibration, as 
well as on the set of tagging algorithms being combined. Due to correlations with other 
tagging algorithms, in particular the OS kaon and vertex-charge taggers, the maximum 
possible increase in tagging power after the addition of the charm tagging algorithm is less 
than its individual tagging power. The performance of the combination of the current OS 
tagging algorithms with and without the addition of the charm tagger has been measured 
on the $B^+ \to J/\psi K^+$ data sample. The absolute net gain in tagging power using the
current combination algorithm is found to be around 0.11%, compared to the current total
OS tagging power of about 2.5%.
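To illustrate why the gain depends on the method of combination, the sketch below combines several tag decisions under the assumption that the taggers are statistically independent. This is a generic likelihood-product combination, not necessarily the combination algorithm used in the measurement, and it ignores exactly the correlations that the text identifies as limiting the gain. The function name is hypothetical.

```python
def combine_taggers(decisions):
    """Combine tag decisions assuming statistically independent taggers.

    decisions: list of (d, omega) with d = +1 or -1 and omega the calibrated
    mistag probability of that decision.
    Returns (combined decision, combined mistag probability).
    """
    like_plus, like_minus = 1.0, 1.0
    for d, omega in decisions:
        # Likelihood of this decision if the true flavour is +1 or -1.
        like_plus *= (1.0 - omega) if d > 0 else omega
        like_minus *= (1.0 - omega) if d < 0 else omega
    prob_plus = like_plus / (like_plus + like_minus)
    if prob_plus >= 0.5:
        return +1, 1.0 - prob_plus
    return -1, prob_plus
```

Two agreeing taggers yield a combined mistag lower than either input, while two equally confident taggers that disagree cancel to a mistag of 0.5. When the taggers are correlated, as for the charm and OS kaon taggers, the independence assumption overstates the improvement.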


7 Conclusion 

An algorithm has been developed that determines the flavour of a signal b hadron at 
production time by reconstructing opposite-side charm hadrons from a number of decay 
channels. The flavour tagger uses boosted decision tree algorithms trained on simulated 
data, and has been calibrated and evaluated on data using the self-tagged decay $B^+ \to
J/\psi K^+$. Its tagging power for data in this channel is found to be (0.30 ± 0.01 (stat) ±
0.01 (syst))%. The calibration has been cross-checked using the decay $B^0 \to J/\psi K^{*0}$, giving
consistent results. The tagging power is found to be higher for the decays $B^0 \to D^-\pi^+$ and
$B_s^0 \to D_s^-\pi^+$, at (0.40 ± 0.02 (stat) ± 0.01 (syst))% and (0.39 ± 0.03 (stat) ± 0.01 (syst))%,
respectively.